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Abstract 

We report a single-copy tempering method for simulating large complex systems. In a 
generalized ensemble, the method uses runtime estimate of the thermal average energy 
computed from a novel integral identity to guide a continuous temperature-space random 
walk. We first validated the method in a two-dimensional Ising model and a Lennard- 
Jones liquid system. It was then applied to folding of three small proteins, trpzip2, trp- 
cage, and villin headpiece in explicit solvent. Within 0.5~1 microsecond, all three 
systems were folded into atomic accuracy: the alpha carbon root mean square deviations 
of the best folded conformations from the native states were 0.2A, 0.4A, and 0.4A, for 
trpzip2, trp-cage, and villin headpiece, respectively. 
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I. Introduction 

Molecular simulations at room temperature usually suffer from a slow d5aiamics 
for large complex systems, such as proteins in explicit solvent. A promising solution to 
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the problem is to use tempering methods, either single-copy based methods ' such as 
simulated tempering, or multiple-copy based methods such as parallel tempering, also 
known as replica exchange . In either case, the system regularly changes its temperature 
in a way that is consistent to the underlying thermodynamics. The value of these 
methods lies in that they can efficiently overcome energy barriers by exploiting a fast 
dynamics at higher temperatures. 

In traditional tempering methods, the temperature is a discrete random variable 
that can only assume a few predefined values. The success rate of transitions between 
two neighboring temperatures depends on the overlap of canonical energy distributions at 
the two temperatures and decays as the system size grows. Thus, to reach an optimal 
sampling efficiency for a large system, one needs to narrow down the temperature gap, 
and to increase the number of sampling temperatures. In simulated tempering, the 
number of weighting parameters to be estimated for simulation increases with the number 
of temperatures. In replica exchange, the node-node communication cost increases with 
the number of temperatures. Naturally, it is desirable to have a efficient tempering 
method that does not depend on a discrete-temperature setup. 

In this paper, we report a single-copy tempering method in which the temperature 
is a continuous variable driven by a Langevin equation. It is based on an improved 
version of a previous method ^. By employing improved estimators for thermodynamic 
quantities, one can not only realize an efficient tempering but also correctly calculate 
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thermodjoiamic quantities for the entire temperature spectrum. The essential feature of 
the method is the calculation of the thermal average energy E{P) along the simulation 

trajectory. After the convergence of E{fi) , the partition function and other 
thermodynamic quantities can be easily derived. 

The paper is organized as follows. In section II we give a theoretical derivation of 
the method. In section III, the method is verified on a two-dimensional Ising model and a 
Lennard- Jones liquid system, where the exact thermod5niamic quantities are either known 
or accurately computable. In section IV we apply the method to the folding of three 
small proteins, trpzip2, trp-cage, and villin headpiece, in exphcit solvent. The minimal 
alpha-carbon root mean square deviations of the best folded conformations from the 
native states are 0.2 A, 0.4 A, and 0.4 A, respectively (the last figure for villin headpiece 
is measured from an x-ray reference structure; and it should be 1.0 A if it is measured 
from an NMR reference structure). 

II. Method 

Our method samples the system in a continuous temperature range and calculates 
thermodynamic properties as functions of the temperature. As we shall see, a random 
walk in temperature space only requires an estimate of the average energy at the current 
temperature in order to correctly populate the desired distribution. We present an 
efficient way for estimating the average energy and use it to perform a fast sampling 
along the temperature. In addition, an adaptive averaging scheme is used to improve 
convergence in early stages. 
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This section mainly concerns the detailed description of the method. A relatively 
self-contained outline is first presented in section II. A. The rest of the section is 
organized as follows. In section II.B, we review basics of sampling in a generalized 
ensemble where the temperature is a continuous random variable. In section II. C and 
II.D, we present integral identities that help the ensemble to asymptotically reach the 
desired distribution. In section lI.E, we present an adaptive averaging scheme to 
accelerate initial convergence. 

II.A Brief outline of implementation 

The method can be implemented as follows. It concerns the simulation of a 
system in a given temperature range (y^^j^ , /3^^^ ) according to a predefined temperature 

distribution w{P) , which is usually proportional to 1/ P for a molecular system (the 
choice is explained in Appendix C). Note, we work with the reciprocal temperature 

P = ll {kgT) , where Ub is the Boltzmann constant and T is the regular temperature, P 

here is a variable that continuously changes within the temperature range. 

In each simulation step, we first change system configuration according either to 
constant temperature molecular dynamics or to Monte Carlo methods. Next, we update 

statistics about the potential energy and then compute an estimated average energy E{P) . 
The energy fluctuation, measured from the difference between the instantaneous potential 
energy E and the averaged value E{P) at the current temperature, determines the amount 
of temperature change that the system can afford to maintain the distribution of the 
generahzed ensemble. It thus can be used to drive a temperature-space random walk in a 
way that preserves the underlying thermodynamics at each individual temperature. 
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Practically, we use the following Langevin equation to guide the temperature space 
random walk, 

d{\lP) _ d(k,T) _ ^ ~ d\nw(^) ^ V2 
dt dt dp p 

where E is the current potential energy, and ^ is a Gaussian white noise that satisfies 

(^(0 • ^{t')) = S{t-t'). Note, here t is the time scale for integrating the Langevin 

equation, which does not necessarily coincide with the actual time in molecular 
dynamics. Apart fi-om the random noise, the rate of change of the temperature is 
determined by the difference between the instantaneous energy E and the average energy 

at the current temperature E{P) . Thus, Eq. (1) tends to raise the temperature when the 
instantaneous energy E rises above the average value E(P) , or to lower the temperature 
when E falls under E(P) . A derivation of Eq. (1) can be found in Appendix A. In 
implementation, p is first converted to its reciprocal k^T = l/P; then k^T is updated 
according to Eq. (1); finally the new P is computed from the reciprocal of the updated 
kgT . We use k^T instead of P as the variable of integration, because in this way, the 
magnitude from the random noise is proportional to the temperature value 
V2/ p = -JlkgT . The effect of the temperature change can be realized as a scaling of 
velocity or force in molecular dynamics. We repeat the process for each simulation step 
until the simulation ends. 

In order to interpolate thermodynamic quantities on a continuous temperature 
range, we divide the entire range evenly into many small bins (y^, , . The bin size is 
used for applying integral identities and is much smaller than the gap between 
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neighboring sampling temperatures in traditional tempering methods, such as replica 
exchange or simulated tempering. During simulation, each bin / collects separate 
statistics on the potential energy and its variance along the trajectory for states with 
P e . Statistics in different bins are later combined together to form unbiased 

estimates of the average energy. For a complex molecular system, the adaptive averaging 
in section lI.E can be used to improve the early convergence. 

For statistical efficiency, E{P) , p e. {P^, , is not calculated from the average 
energy of the bin, but instead from a large temperature window {P_,P_^) containing the 
bin as 



, ^'^ iPj) is a modulating factor to ensure an unbiased estimate, see Figure 1(b), 

and its computation is detailed in the next paragraph. The temperature window {P_,P^) 
is determined from the current temperature in such a way that is approximately at 
the center of the corresponding window (/9_,y^+) . 

The first step of computing E{P) is to determine two parameters a+ and a_ by 
solving the following two equations 




(2) 



J 



where Ay^ . = p.^^ - (5- is the bin width; Ie\ . is the average energy from the jth bin 



a^ + a_=l. 



(3) 




(4) 
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Here, and a_ are parameters defined in the equation ^{P) = {P) + {P) , whose two 
components ^XP) ^AP) are defined as 

\a_{p-p)l{Pj.,-p), P^{P-,P,n) 



U{P - A) /(A. - PjJ, P e (A.i, p.) 



and 



[ 0, otherwise 
respectively. Figure 1(a) schematically illustrates ^XP) ■> ^AP) ' ^"^^ ^(P) ■ Eq. (4), 
the inner bracket (. . .)^, is a configuration average of the energy fluctuation (aE^^^^ at a 

fixed temperature y^' . For P' e {Pj,Pj^.^), since the bin size is small, we compute the 
energy fluctuation fi-om the states collected in the bin (^p.,p.^^), and use it to 



approximate \SE . The outer angular bracket (. . ^ ^ denotes an average over 

temperature P' within the window {P_,P^) , it is computed as a sum of the energy 
fluctuation from different bins within {P_,P^) with ^{P') being the coefficient of 
combination. After the averaging, Eq. (4) is a simple linear equation of and a_ . By 
solving Eqs. (3) and (4), and a_ are determined. In a physical solution, both and 
a_ are nonnegative. If the linear equations lead to a negative value for either or a_ , 
zero is used instead. This measure ensures the robustness of our estimator and is thus 
usefiil in early stages when the energy fluctuation (aE^^^^ is unreliable. The 

determination of or a_ completely specifies ^^P) and its derivative ^KP). 
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II.B Generalized ensemble with a continuous temperature 

We start our method by constructing a generalized ensemble in which the 
temperature is a continuous variable in a given range (^^i^ , ) . To sample the 

system correctly, we also need to make sure that the configurational distribution at a 
particular p is identical to that of the canonical ensemble at the same temperature. The 
aim of the method is to correctly populate states in the generalized ensemble and to 
extract thermodynamic properties for the entire temperature range. 

First, the generalized ensemble is completely specified by an overall - 
distribution p{P). Once p{P) is given, the complete distribution of atomic 
configuration X as well as the temperature /3 is also determined: 

p(fi.X)-'-^^^^^p(fi), (7) 
where E{X) is the potential energy of configuration X, and 

^{P) = [ sxp[-y9£'(X)] dX is the canonical partition function. Eq. (7) is due to the 

J conf. 

requirement of preserving a canonical distribution at each temperature. It is easily 
verified that p{P) is recovered after we integrate p{P,X) over all configurations. 

The usefulness of the joint distribution p{fi,X) in Eq. (7) lies in that it specifies a 
temperature distribution under a fixed configuration X For a fixed configuration X, we 
can perform temperature-space sampling according to p{P,X) and replace p{P) by any 
desired temperature distribution w{l3) , which is fixed during simulation, in Eq. (7). If 
the configurational space is sampled according to the Boltzmann distribution, the 
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resulting overall temperature distribution after an infinitely long simulation trajectory, 
must be identical to the desired one w{fi) . 

However, the exact Z{P) is usually unknown in advance. We therefore use a 
modified version of Eq. (7) 

to conduct sampling in the temperature space, where an approximate partition function 
Z{P) is used in place of Z{P) . Note, if Z{p) differs from Z{fi) in Eq. (8), is no 
longer the overall temperature distribution, but only a parameter that specifies p{/3, X) , 
which is used in guiding the temperature-space sampling. The overall temperature 
distribution p{P) is calculated from integrating the joint distribution p{P,X) over 
configurations, 

p{J3)=\ P{P,X)dX. (9) 

Jconi. 

Using Eq. (8) in Eq. (9), we have 

p{P) = ^w{P). (10) 

In simulation, Z{P) is adaptively adjusted and w(yff) is fixed. Therefore, the overall 
distribution p{P) varies according to Eq. (10). Upon convergence, Z{P) Z{P) , the 
overall temperature distribution p{P) converges to the desired one w(y^) . 

Given a configuration X, as well as the joint distribution p{P,X) Eq. (8), 
sampling along the temperature can be performed by the Langevin equation Eq. (1), in 
which the estimated average energy E{fi) relates to the estimated partition function as 
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E{P) = -d \nZ(j3) I dp . One can demonstrate its correctness by solving the 
corresponding Fokker-Planck equation, see Appendix A. 

The remaining task is to make sure the convergence of the estimated partition 

function Z{P) to the correct one Z{P) . As evidenced by Eq. (10), the current overall 
temperature distribution p{P) is close to the desired one w{P) only if the estimated the 

partition function Z{P) is sufficiently accurate. It is interesting to note that the Langevin 

equation Eq. (1) does not involve the estimated partition function Z{P) itself, but its 

derivative E{P) instead. We should therefore exploit this feature and focus on a 

technique that adaptively improves the estimate, E{P) . 

II.C Unbiased estimate of a partition function 

The asymptotic convergence of the partition function requires that Z{P) 
approaches to Z{P) at any p in the range (y^min' y^max) • In implementation, we divide 
the temperature range to many narrow bins {P^, P^^^) . Thus we lower the requirement to 
that at any bin boundary p^, Z{p.) should equal to Z{P^). By choosing a small bin size, 

we can ensure that the deviation from Z{P) to Z{P) is negligible for all practical 
purposes. Further, to remove the dependence on a reference value of the partition 
function, the convergence condition is unambiguously expressed as a condition on ratios, 

Z{P,)IZ{P,,,) = Z{P,)IZ{P,,,). (11) 
Eq. (11) can be rearranged as ^, 
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d\nZ{J3) dXnZiP) 



^ dp 

= \'-dp[EiP)-{E)^ 

'e{P)-Ie) 



dp 
Pip) 



ZiP) 
Z{P) 



w{P) 



wiP) 



where we have used Eq. (10) on the second line; the inner bracket (e) denotes the 



average energy at a particular energy p ; on the last line, we convert the integral over the 
small temperature bin (p., yff.^j) to a temperature average in the same bin, as represented 



by the outer bracket (. . .). , which is formally defined as (^). = ' dp p{P)A . 

During simulation, if we adaptively enforce the right hand side of the above 
equation to be zero, i.e.. 



(12) 



the left hand side naturally vanishes as well, i.e., Z{p._^^)/ Z{p.^^) = Z{p.)/ Z{p.) . In this 
way the partition function as a function of the temperature can be obtained ^. 

We shall proceed by assuming a sufficiently small bin size and (i) E(P) being a 
constant E. within a bin i, and correspondingly, \nZ(P) varies linearly with P , and (ii) 
p{P)/w{P) can be treated as a constant. Eq. (12) is then simplified as 



E.= 



l'^-;'dp{E)^p{P)/w{p) 

t'dPpiP)/wiP) 



« E 



(13) 
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and the ratio of the partition function is estimated as 

ln[Z(;^,)/Z(/?,,,)] = ^M, 

where A^ff,. = - y^. . 

A direct implication of Eq. (13) is that one can estimate E^ from averages from 

statistics accumulated in bin i. Such an approach (which is similar to those used in the 
force averaging method is, although correct, ineffective when the bin size is small 
because the amount of statistics within a bin shrinks with the bin size. On the other hand, 
a large bin size can lead to a significant deviation from the desired temperature 

disfribution, since we assumed a constant E{P) within a bin. This dilemma can be 
resolved by using integral identities that remove the bin size dependence. 



II.D Estimators based on integral identities 

We now present a method for drawing an unbiased estimate from a large 

temperature window instead of a small temperature bin. The method removes bin size 
dependence by combining statistics from neighboring bins in a way that avoids 
systematic error. A similar technique of employing integral identities to improve 
statistics was previously used in improving statistical disfributions 

We aim at transforming the right hand side of Eq. (13) from an average over a 
single bin {fi^, J3._^^) to an average over a larger temperature range (y9_, J3_^) that encloses 
the bin. To do so, we use the following integral identity: 

(14) 



= ^i^){E)^ 
= \ldp<l>\P){E)^-\''dpm{^')., 
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where ^{fi) is a function that vanishes at the two boundaries, i.e., ^{P_) = ^{P+) = ; on 
the second line, we convert the difference between the two boundaries to an integral 



mechanics is also used. 

We choose ^{P) as a superposition of a smoothly varying ^XP) that spans over 

the entire window {P_,P^) and a localized function ^,{P) limited within the bin 
(Pi- Pi+i) ' see Figure 1 and Eqs. (5) and (6), in which and a_ are two nonnegative 
parameters that sum to unity, i.e., a_^+a_=l. Note at P = p._^^ , the sudden jump in 
(pXP) is exactly cancelled by that in (pXP) > thus we can ignore the S -functions in (pKP) 
and ^XP) i^ actual computation. 

The purpose of the decomposition of ^(P) into (^^(P) and ^,(P) is to use the 
localized function (pXP) to create an integral exactly equal to the right hand side of Eq. 
(13). In this way, the integral over the small bin is transformed to another one, but over a 
larger temperature window 



In early stages of simulation, the energy fluctuation (^AE^j^ in the second term of 



the right hand side of Eq. (15) can be unreliable for a complex system. To avoid direct 
inclusion of energy fluctuation, we choose the combination parameters and a_ in such 
a way that the fluctuation term vanishes. 



within the temperature range; the identity d(^E)/dp = -(aE^\ from statistical 




(15) 
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(16) 



Eq. (16) and a^+a_=l yield a solution of and a_ , which also determines . 
Thus we have, 



Since ^1(J3) is a constant within a single bin, it can be factored out of the integral when 
integrating each individual bin. The integral is thus converted to a sum over averages. 



A comparison with Eq. (13) shows that Eq. (18) is merely a linear combination of 
E. 's obtained from different bins. The auxiliary function ^l(^) serves as a set of 

coefficients of combination, whereas Eq. (16) ensures the asymptotical convergence. 

The above technique of extracting an estimate from an integral identity can be 
employed in computing other thermodynamic quantities, such as the average energy, heat 
capacity, and energy histogram. Thus, we are able to calculate these quantities at a 
particular temperature, even though the simulation is performed in an ensemble where the 
temperature is continuous. 

The result for the average energy is most easily obtained. Consider a limiting 
case where y^,._^j (with ^(yff) modified accordingly), Eq. (15) immediately becomes 

an unbiased estimate of the thermal average energy {E'j^ exactly at yff,. . 

Similarly, the heat capacity Cy {fl) can be calculated from the energy fluctuation 

as . The energy fluctuation at a particular temperature B is computed as 




(17) 




(18) 
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where we have substituted -(^AE for d{AE ^^jdp. 

Estimating the energy histogram requires additional care in choosing the function 
^{P) , in order to ensure that the histogram is nonnegative everywhere. It is shown in 
Appendix D that the optimal estimate of an energy histogram at temperature P is 



^ Pp{E')dE' 

tt^ PiP\E')dE'dp' , (20) 



f^'wiP') Z{P)IZ{P') exp[-(;9' - P){E + AE/2)] dp' 

where on the first line, the energy histogram h^(E,E + AE) at P is defined Irom the 
constant temperature energy distribution Pp(E) at the same temperature; on the second 
line, it is converted to an average over the joint temperature-energy p(P,E) distribution 

in the generalized ensemble, which can be measured from the temperature-energy 
histogram in simulation trajectory. Note, for simplicity, we have assumed that the energy 
bin size is small compared with the energy fluctuation of a typical Boltzmann 
distribution. Eq. (20) resembles the result from the multiple histogram method ^, and 
therefore can be treated as its counterpart in a continuous temperature ensemble. 

For a quantity whose statistics is not fully accumulated during simulation (e.g., it 
is calculated from periodically saved trajectory snapshots after simulation), we use the 
following reweighting formula to obtain its value at a particular temperature P , 

\';A oxpi-iP - P')E] ZiP')/ZiP)dp' 
A =i£, (21) 

' l\xp[-(P-p')E]Z(P')/Z(P)dp' ' 
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where exp[-(j3 - J3')E] Z(J3') /Z(J3) serves as a weighting function for borrowing 

statistics from P' to P . Note Eq. (21) is an unbiased estimator even if Z{P) contains 
error. 

II.E Adaptive averaging 

We now introduce an adaptive averaging scheme for accelerating the convergence 
in initial stages of a simulation. Since we start from zero statistics, the error in initially 

estimated E{P) can lead to a slow random walk in the temperature space. The adaptive 
averaging scheme overcomes the problem by assigning larger weights toward recent 
statistics, and thus encourages a faster random walk in the temperature space. 

Usually, for a statistical sample of size n, the average of a quantity A is calculated 

as an arithmetic mean {A) = I Sl"^ , where, S^f = ^"^^4 is the sum of ^ and S^"^ is 

the sample size n. In simulation, since statistics is collected along the trajectory, the 
sample size n increases as simulation progresses. To increase the weight toward recent 
4 's, we redefine S^f and S^"^ as 

s^f=A,r"-'+A,r"-'+-+A„, 

where / (<1) is used to gradually damp out old statistics. Such an average can be easily 
implemented as recursions. 
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However, if a constant / is used, an average derived from Eq. (22) does not 
asjmiptotically reduce its error due to that the sample size ultimately saturates to a 
fixed value 1 /(I - /) . Therefore, we use 

r = l-C^/n, (23) 
where is a numerical constant, to gradually reduce the difference between / and 1.0. 

In this way, both a fast random walk in early stages and an asymptotical convergence can 
be achieved. 

III. Numerical Results 
III.A Ising model 

As the first example, we test our method on a 32x32 Ising model, which is a 
nontrivial system with exactly-known thermodynamic properties Results from the 

alternative method of estimating E(/3) , described in Appendix B, are also included. 
Parameters common to the two methods were set to be the same. 

The temperature range was ^ e (0.35, 0.55) or T = (1.818, 2.857) , which covered 
the critical temperature T » 2.27 of the phase transition, and the temperature distribution 
was defined by a constant w(^) , or a flat- ^ histogram. The bin size for collecting 
statistics was Sj3 = 0.0002 , and thus there were 1000 bins in the entire temperature range. 
For each temperature bin (p., ^.^i) , the temperature window for applying integral 
identities was given by fi_^)=(j3i - Afi, AJ3) with Ayff = 0.02 in evaluating Eq. 
(18), thus 201 bins were included in a window (except at temperatures near the 
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boundaries, where the largest possible value of Ayff is used). The Langevin equation was 

integrated after every 100 Monte Carlo moves, with an integrating step At = 2.0 x 10 \ 
The whole simulation stopped after 10^ Monte Carlo moves per site. 

The calculated partition ftmction, average energy and heat capacity are shown in 
Figure 2. The thin solid lines and dashed lines are for the method introduced in section 
II.D (method 1), and that introduced in Appendix B (method 2), respectively. In most 
cases, both results for the partition function and the average energy coincide with exact 

Q 

results , see Figures 2(a), 2(c), and 2(d). Even for the heat capacity, which is the second 
order derivative of the partition function and is harder to compute, the deviations in both 
cases are small. This indicates that our method is unbiased and it can produce exact 
thermodynamic quantities asymptotically. 

We now show that the correct energy distribution can be reconstructed from Eq. 
(20). Since the energy levels are discrete and the energy bin size we used was equal to 
the smallest gap between energy levels, a normalized energy histogram is equivalent to 
the energy distribution. During simulation, the current potential energy was registered 
into the histogram every 100 Monte Carlo moves, thus there were roughly 10^ samples in 
the entire histogram. The reconstructed energy distributions at a few representative 
temperatures are shown in Figure 3. The temperature window for estimating the energy 
disfribution at ^ was (J3 - A^, p + Afi) with Afi = 0.02 . One can see a good agreement 
between the integral identity Eq. (20) (the thin solid line), and the exact distribution 
(thick solid line), which was computed from the exact density of states ^. For 
comparison, distributions constructed by averaging energy distributions of two adjacent 
bins are shown as dashed lines. It is apparent that the simple average yields more noisy 
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results than the integral identity Eq. (20). The reduced error is due to Eq. (20) being able 
to access statistics from a much larger temperature window than from two neighboring 
bins. 



III.B Lennard- Jones system 

As the second example, we test the method on an 864-particle Lennard- Jones 
liquid. In reduced units, the Lennard- Jones potential for a particle pair separated by 
distance r is 



where the units of energy, mass, and length are 1 .0. In the simulation, the density was 
0.8; the cutoff was 2.5; and the temperature range was /3 = (0.48,1.02), corresponding to 
T = (0.98,2.08) . We used Monte Carlo to generate configuration changes. In each step, 
a random particle was displaced randomly in each of x, y, and z directions according to a 
uniform distribution in (-0. 1,0. 1) . After a Monte Carlo step, we applied the Langevin 
equation Eq. (1) with an integration time step A? = 0.0002 . The system was simulated 
for 10^ sweeps (a sweep = a step per particle). Coordinates were saved every 10 sweeps 
for data analysis. The overall temperature distribution w{P) was proportional to 1/ P 
(the choice of optimal w{P) is discussed in Appendix C). The temperature bin for 
collecting statistics was 5fi = 0.0005 . The window size A/^ for applying the integral 
identity Eq. (18) was 0.05 . 

The simulation results for the estimated average energy (^E'j^ and E(j3) are 

shown in Figure 4(a). Due to a small bin size, the difference between the two is invisible. 
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The dots on the figures represent results from independent constant temperature 
simulations, each of which uses the same amount of simulation time. A good agreement 
between the two indicates that our method can produce exact thermodjoiamic quantities 
for the entire temperature spectrum with less simulation time. 

Furthermore, we reconstructed the radial distribution fimction g(r) at a particular 
temperature according to Eq. (21). The reconstruction was performed after simulation 
and was based on the saved coordinates along the trajectory. From Figure 4b, we see a 
good agreement between reconstructed 's and those from constant temperature 
simulations at two different temperatures = 0.5 and 1.0, or r = 2.0 and 1.0, 
respectively. 

IV Applications in folding small proteins 

In this section, we report applications of the method in folding of several small 
proteins. The method was implemented in a modified GROMACS 4.0.5 using 
AMBER force field ports with TIP3P water model In all cases, the particle-meshed 
Ewald method was used for handling long range electrostatic interaction, and the 
velocity-rescaling method was used as thermostat For constraints, we used the 
SETTLE algorithm for water molecules and the parallel LINCS algorithm for 
proteins. Since proteins drastically changed their configurations during simulations, 
dynamic load balancing was turned on when using domain decomposition. 

For configuration-space sampling under a fixed temperature, the canonical 
ensemble, i.e., the constant (N, V, T) ensemble, was used. A 10 A cutoff was used for 
Lennard- Jones interaction, electrostatic interactions and neighboring list. Since the 

21 



temperature in our method is a variable, the temperature change was realized by scaling 
the force according to F' = {fij Pq )F = {Tq/t)F , with F' and F being the scaled and the 

original force, respectively. In this way, the scaled potential energy transforms a 
canonical ensemble at To to another temperature T, for sampling in the configurational 
space. On the other hand, the thermostat temperature To, which controls the kinetic 
energy, is unaffected and can be maintained at a fixed value =480 K. Three separate 

thermostats were applied to protein, solvent and ion groups, and the coupling time Tj, for 
the thermostats was 0. 1 ps. 

The time step for molecular dynamics was 0.002 ps. The center of mass motion 
was removed every step. Trajectory snapshots were saved every 2 ps. Since the force 

and energy calculation was much more time consuming than estimating E(P) , we 
applied the Langevin equation in every molecular d5aiamics step using an integration step 
of At = 10 . The parameter in the adaptive averaging Eq. (23) was 0.1 in all cases. 

IV.A Trpzip-2 

The first system is a 12 amino acid P-hairpin tryptophan zipper, whose Protein 
Data Bank (PDB) ID code is ILEl, and its sequence is SWTWENGKWTWK A 
unique feature of this short hairpin is that its four tryptophan side chains are locked 
against each other to stabilize the structure. 

Previously, the system was intensively studied both in explicit solvent and in 
implicit solvent However, de novo folding in explicit solvent fi-om an extended 
chain, see Figure 5(a), is much more challenging. We used AMBER99SB as the force 
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field, which is a modified version of AMBER99 force field with updated (p and y/ 

torsions 

We report four simulation trajectories. All of them reached atomic accuracy 
within in a time scale of 1 |a.s. In all the cases, we used a cubic 45x45x45 box, filled 
with 2968 water molecules as well as two CI ~ ions. The temperature range was 
P e (0.20,0.41) , which corresponded to r= 293.6K~601.9K temperature range. The grid 
spacing for Fourier transform was 1.1 5 A, and the alpha parameter was 0.3123 A~^ for the 
Ewald method. In application of integral identities Eqs. (18), (19) and (20), the 
temperature windows size was 8% of temperature value, e.g., at 
P = 1 .0/ikg 500K) « 0.24 1 the temperature window is (j3 -AJ3,J3 + AJ3) , with 
Aj3 = 4% xj3x 0.010 , which can be translated to T = (480.8,520.8) K. At boundaries, 
we used the largest possible size that allows a symmetrical window. 

A typical folded structure is shown in Figure 5(b), with its root mean square 
deviations (RMSD) for alpha-carbon (Ca) and heavy atoms being 0.25 A and 1.08 A, 
respectively. The lowest Ca-RMSD and heavy atom RMSD found in the four trajectories 
are listed in Table I, in which we also list the approximate first time of stably reaching the 
atomically accurate native structure (the criteria were Ca-RMSD < 0.5 A). It is 
interesting to note that even among structures with lowest RMSDs, tryptophan side- 
chains can still adopt different conformations. For example, in trajectories 1 and 3, we 
found native-like structures with one of the tryptophan side-chain (TRP9 or TRP4) 
flipped 1 80° with respect to the native configuration. This suggests that the fi-ee energy 
change involved in flipping a TRP sidechain is small. 
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In Figure 6, Ca-RMSD, radius of gyration, instantaneous temperature and 
potential energy along trajectory are shown in (a), (b), (c) and (d), respectively, for the 
two independent simulations, trajectory 1 and 3. In trajectory 1, the folded state was 
reached within 20 ns. This suggests the possibility of fast folding. However, in other 
trajectories, it took longer for the system to reach the native structure. In trajectory 3, 
upon reaching of the native structure, the system lingered in a state of low-temperature, 
low-energy state, and small radius of gjo-ation for about 100 ns. This corresponds to the 
fact that the folded state has a lower energy than the unfolded state and thus occupies a 
larger fraction in a low temperature Boltzmann distribution. This feature serves as a 
signature of the system reaching a native structure, and can be useful in folding 
prediction where the native structure is unknown. 

Figure 7(a) shows the distribution along the Ca-RMSD at three different 
temperatures calculated from trajectory 3. The distribution demonstrates two well- 
separated peaks, corresponding to roughly-defined folded and the unfolded states, 
respectively. The average Ca-RMSD from the native structure is roughly 0.8 A and 5.5 A 
for folded and unfolded states, respectively. As the temperature increases, the first peak 
gradually diminishes, whereas the second peak dominates. 

The folding temperature can be computed by assuming a two-state model of 
folding. The folding fraction P was first calculated as the fraction of configurations with 
Ca-RMSD less 2.0 A at different temperatures. The curve P{P) as a function of 
temperature was then fitted against a two-state model in the range e (0.24,0.4) , 
equivalently T 6(300.9 K, 501.5 K), 

P = ^ , (24) 

l + exp[A^(;ff„-;5)] 
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where the values of the three parameters were determined as Pq = 0.61 , 
AE" = 38.96 kJ-mor\ and =0.350 kJ~^-mol by regression, see Figure 7(b). Here, 
is the energy difference between the folded state and the unfolded state; fi^ is the 
melting temperature. In the simplest two-state model, the folding fraction increases 
monotonically to unity as the temperature decreases to zero. Here, the maximal fraction 
was changed from 1.0 to an adjustable parameter Pq, which helped fitting the calculated 
P(y^) to the two-state model. Physically, such a modification implies the existence of 

many configurations with energy similar to that the native one but with different 
structures in our simulation trajectory. 

If the volume change during folding is ignored, the enthalpy change between the 
folded state and the unfolded state is roughly equal to the energy change A/f ~ 39.0 
kJ mor\ The entropy difference of the folded and unfolded states is 
AS = y^^AE = 1 13 J-mor\ These values are relatively small compared with the 

experimental values A/f''"'= 70.2 kJ-moF^ AS""" = 203.3 J-mol"^ ^\ However, the 
estimated folding temperature from our calculation 344 K is close to the experimental 

1 

result 345 K . From frajectory 3, which yields the highest folding fraction at 300 K 
among four trajectories, the fraction of folded states at 300 K is roughly 55%, which still 
differs significantly from the experimental value 91% . For the other three frajectories, 
the calculated fractions are even smaller. The difference between our calculation and 
experiments were likely due to insufficient sampling and/or force field inaccuracy. 

Figure 8 shows the heat capacity vs. temperature from Eq. (19). The difference 
between three independent frajectories was small, suggesting thermodjTiamic properties 
of the entire system, protein and water, reaching convergence. It is interesting that in our 
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simulation, the heat capacity is not a constant, but inversely proportional to the 
temperature as Cy « 4.2x1 oV^kJ-mor'-K"^ 

Figure 9(a) shows the joint distribution of the radius of gyration Rg (calculated 
from Ca atoms) and the Ca-RMSD. Generally the two measures are positively correlated. 
However, the smallest radius of gyration does not occur at the native structure for the 
hairpin, whose Rg is roughly around 5.8 A. A normative structure, on the other hand, had 
a smaller Rg around 4.9 A, but a larger RMSD around 3 A. Figure 9(b) shows the joint 
distribution of temperature and energy. A typical temperature fluctuation is only around 
5K, which is roughly the magnitude of temperature gap in other tempering methods based 
on a discrete temperature, such as replica exchange. On the other hand, the temperature 
window size used in our simulation was much larger. This suggests that our method 
could more efficiently use statistics to facilitate the temperature-space random walk. 

IV.B Trp-cage 

The second application is a 20 amino acid alpha helical protein, tryptophan cage 
(trp-cage) The PDB code is 1L2Y, and the amino acid sequence is 
NLYIQWLKDGGPSSGRPPPS. The system was extensively studied by experiments 

' as well as by various sampling techniques, either in explicit solvent ' or in implicit 
solvent We again used AMBER99SB as the force field 

We simulated the system in a cubic 46x46x46 A^ box, filled with 3161 water 
molecules and two CI ~ ions. The grid spacing for Fourier transform was 1 . 1 9 A, and the 
alpha parameter was 0.3123 A~\ The initial structure was an open chain see Figure 
10(a), which was constructed by bending a fully-extended chain to fit into the box. 
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We report three independent 1 \is simulation trajectories. The temperature range 
for trajectory 1 was /? e (0.24, 0.41) ,or Te (293.6 K, 501.5 K). Li trajectories 2 and 3, 

we used a larger temperature range ^ e (0.20, 0.41) , which covered a 293.6 K~601.9 K 
temperature range. The temperature bin size Sj3 was 0.0005, 0.0005 and 0.0002 
respectively. In applying integral estimators, the temperature windows size was 10%, 
10% and 8%) of temperature value for trajectories 1, 2, and 3, respectively. In all cases, 
we used the alternative estimator introduced in Appendix B. 

All three simulations independently reached atomically accurate native 
configurations. A typical folded structure in trajectory 1 is shown in Figure 10(b). The 
minimal alpha-carbon root mean square deviations (Ca-RMSDs) from the three 
frajectories were 0.43A, 0.48 A, and 0.44 A respectively. The average Ca-RMSD for the 
native structure was around 0.8 A. The lowest RMSDs for all heavy atoms were 1.34 A, 
1 .47 A and 1 .46 A, respectively. 

The Ca-RMSD, radius of gyration, instantaneous temperature and potential energy 
along frajectory are shown in panels (a), (b), (c) and (d), respectively, of Figure 11, for 
the three independent simulations. In each trajectory, there were two folding events 
reaching an atomic accuracy, e.g., for frajectory 2, the native structure was reached at 350 
ns and 560 ns. As in the trpzip2 case, the system stayed around a low-temperature and 
low-energy state for 30ns~ 100ns upon reaching the native state. For the folding speed, it 
appears that simulations 2 and 3, which used a higher roof temperature T^ax = 600K, 
tended to reach the native state sooner than simulation 1, whose roof temperature Tmax = 
500K was lower. However, in simulation 1, the system was able to stay in the native 
states longer and performed a more detailed sampling at the low temperature end. 
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Trajectory 1 yielded the largest fraction of the folded state at 300 K, 19%, which 
is still much lower than experiment value, 70% . By fitting the folding fraction, 
computed from the fraction of states with Ca-RMSD less than 2.2 A cutoff, to the two- 
state formula Eq. (24), the parameters are Pq = 0.50 , AE = 21.1kJ-mor\ and fi^ = 0.367 
kr^-mol. The enthalpy change is thus AH ~ 21 kJ-moF^ which is relatively small 
compared with the experimental values A/f^''P= 56.2 kJ-mol"^ The estimated folding 
temperature from our calculation is 328 K, which is slightly higher than the experimental 
result 315 K^l 

IV.C Villin headpiece 

Our last application was the villin headpiece, a 36 residue alpha-helical protein 
HP36. The PDB ID is 1 VII, and the amino acid sequence is 

MLSDEDFKAVFGMTRSAFANLPLWKQQNLKKEKGLF. This system was the first 
protein partially folded in explicit solvent . Recently, a high resolution x-ray structure 
of slightly modified protein HP35, PDB ID lYRF, was published . The sequence of 
HP35 is LSDEDFKAVFGMTRSAFANLPLWKQQHLKKEKGLF, in which the N- 
terminal methionine of the original sequence was chopped off and the 28* residue, an 
asparagine, was replaced by a histidine. Both sequences were studied in literature, both 

3 1 34 35 36 

by simulations " , and in experiments ' . 

However, there is a significant difference between the NMR structure of HP36 
and the x-ray structure of HP35, about 1.62 A difference in terms of Ca-RMSD. Further, 
while molecular dynamics simulations on HP35 reached an atomically accurate 
resolution in implicit solvent as well as in explicit solvent , simulations on HP36, 
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especially in explicit solvent, yielded relatively poorer results. The difference between 
the two structures could be due to 1) the intrinsic difference between HPS 5 and HPS 6, or 
2) differences in the two experimental techniques, NMR versus x-ray . 

In this simulation, we used AMBEROS as the force field, which was previously 
used to fold HPS 5 to an atomic accuracy in implicit solvent . Another reason of 
choosing AMBEROS instead of AMBER99SB was that the latter force field might 
slightly disfavor helical conformations . To reduce simulation size, we used a 
dodecahedron simulation box with edge length being 24. 1 A to accommodate the protein 
as well as SS4S water molecules, and two Cl~ ions. The volume of the box was 
5S.Sx5S.SxS7.6A =1.069x10^ The initial conformation of the protein was a fully 
extended chain, which was bent fi^om a linear chain to fit into the box, see Figure 12(a). 
The grid spacing for Fourier transform was 1 . 1 9 A, and the alpha parameter was O.S 1 2S 
A-'. 

We report two independent simulation trajectories, each of 2 \is. The temperature 
range for trajectory 1 was e (0.18, 0.41) , or T e (29S.6 K, 668.7 K); that for trajectory 

2 was y» e (0.20, 0.41) ,ot Te (29S.6 K, 601.9 K). The temperature bin size was 0.0002 
in both cases. The integral estimator introduced in Sec. II was used, and the temperature 
windows size was 8% of current temperature value. 

The alpha-carbon root mean square deviation can be calculated from the NMR 
structure as well as the x-ray structure. In both cases, the N-terminal MET and LEU as 
well as the C-terminal PHE are not included in our calculation . Due to the flexibility 
of the C-terminal and the N-terminal helix, we also calculate the Ca-RMSD for residues 
9~S2 of HPS6 as in the literature ^^'^^ denoted as RMSDcore here. Note, in terms of Ca- 
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RMSDcore, the NMR reference structure differs from the x-ray reference structure by 0.87 
A. 

The lowest Ca-RMSDs reached in the two simulation trajectories are listed in 
Table II. The lowest Ca-RMSDcore from the NMR structure are 0.72 A and 0.73 A, for 
the two frajectories respectively. These figures are smaller than those from a previous 
study, in which the lowest RMSDcore was around 1.5 A ^\ In both trajectories, the best 
folded structures are more similar to the x-ray structure than to the NMR structure. The 
average Ca-RMSD and Ca-RMSDcore from the NMR structure are 1.30 A and 1.95 A, 
respectively. In comparison, the two figures drop to 0.90 A and 1 . 10 A if the reference is 
switched to the x-ray structure. 

Table II also lists the first time for reaching the native structure. In both 
frajectories, the time (3 10 ns for frajectory 1 and 26 ns for frajectory 2) is significantly 
shorter than those in a very recent study (where the folding occurs in 5-6 f^s) 

In Figures 12(b) and 12(c), we show the superposition of typical folded structures 
to the NMR reference structure and the x-ray reference structure, respectively, from 
simulation 1 . Although generally deviations from the native structures are small, the 
position of the N-terminal helix differs appreciably from the NMR native structure 
[notice the difference in position of PHE 7 and PHE 1 1 in Figure 12(b)], whereas the 
difference is much smaller compared with the x-ray structure [PHE 7 and PHE 1 1 of the 
two structures are superimposable in Figure 12(c)]. 

Figures 13(a) and 13(b) shows the Ca-RMSDcore (the stable region residues 9~32) 
versus the Ca-RMSD (entire chain) for the NMR and x-ray reference structures, 
respectively. It is clear that the best folded structures have larger deviations from the 
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NMR structure than from the x-ray structure. Moreover, the Ca-RMSDcore is less 
consistent with the Ca-RMSD with respect to the NMR structure than to the x-ray 
structure, which can be a result of the extremely flexible N-terminal helix. In general, we 
find that the folded structures in our simulation are closer to the x-ray structure. In both 
case, we observe many metastable states around the native state, demonstrating an 
extremely rugged energy landscape. 

The folding fraction, computed according to Ca-RMSD from the x-ray structure 
and using 3.0 A as cutoff, was fitted against Eq. (24), and yielded Pq = 0.21 , 

AE = 20.9kJ-mor', and y^„= 0.349 kJ '-mol. Assuming that the volume change during 
folding is negligible, we estimate the enthalpy change AH= 21 kJ-moF^ and the folding 
temperature Tm = 345 K. Although the folding temperature agrees well with the 
experimental value 342 K, the folding enthalpy is relatively small compared to AH^'^^= 
113kJ-mor' 

V. Concluding Discussions 

In conclusion, we presented a single-copy enhanced sampling method for 
studying large complex biological systems. The method was validated in an Ising model 
as well as in a Lennard- Jones fluid, and was successfully applied to folding of three small 
proteins, trpzip2, trp-cage and villin headpiece, in explicit solvent. In all three protein 
cases, we reversibly reached atomic accuracy of the native structures within a 
microsecond. Since our method is based on a single trajectory, it is computationally less 
demanding to reach a long time scale than tempering methods based on multiple copies, 
such as the replica exchange method. 
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In terms of sampling efficiency, in addition to the advantage from using a 
continuous temperature generalize ensemble, which eases the issue of low temperature- 
fransition rate in large complex systems , a major improvement in this work is the 

employment of integral identity that efficiently estimates the average energy E{P) for 
the temperature space random walk. The integral identity draws estimates from a large 
temperature window instead of from a single bin to improve statistics. This sfrategy 
makes the method more robust and applicable to large and complex systems. 

Additionally, the adaptive averaging scheme used for refreshing statistics can 
effectively boost the temperature space random walk in early stages, when the damping 
magnitude y still differs significantly from unit. It could account for several fast folding 
(<50 ns) events we observed. However, in later stages of simulations, the boosting effect 
gradually weakens as the damping magnitude y is switched towards 1.0 (see Eq. (23)) to 
achieve an asjmiptotic convergence. 

In folding applications, the calculated values for the folding enthalpy and folding 
temperature still showed discrepancies from experimental results. This was possibly due 
to errors in the force fields ^^'^"^ as well as limited simulation time. These factors might 
lead to overpopulation of normative conformations at room temperature. This 
observation is in line with a very recent long time replica exchange simulation on trp- 
cage with an aggregated 40x l|xs, which also revealed discrepancy between experimental 
and simulation results . Even in the presence of enhanced sampling, one would desire 
to perform much longer simulations to allow much more folding events to atomic 
accuracy. In this way, one can more accurately examine population ratios of the native 
conformation to various folding intermediate and normative conformations. 
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Appendix A: Fokker-Planck equation and stationary distribution 

Though the temperature distribution of the generalized ensemble is uniquely 
specified by Eq. (8), various Langevin equations exist for sampling the distribution, e.g., 
both Eq. (1) and the one used in the previous study correctly populate the distribution. 

Before applying a Langevin equation, one needs to find a proper "potential" 
V{P) for the temperature to guide its diffusion. The potential is the negative logarithm 
of the temperature distribution, 

V{P)^-\np{P,X) 

The negative derivative of the potential naturally serves as the force of driving the 
temperature space random walk. According to Eq. (8), we have 

dp dp dp dp 

Eq. (1) is now simplified as 

djllP) _ dV{P) ^ 42 

dt dp p 

To show that Eq. (25) is correct, we introduce a new variable r = \j P = k^T and 
write down the Fokker-Planck equation that governs the distribution of r , or pij) . 



dp{t) _ d 
dt ~ dr 

_d_ 
~ dx 

2 a 5 



dV 

—p{T) 
dp 



d 

+ 



-[rV(r)] 



idV 



dz 



dr' 

(26) 



where we have used -t — = — . A stationary solution of Eq. (26) can be found by 

dr dp 

solving the following equation, 
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whose solution is readily obtained, 

Finally, we translate the distribution of t back to that of ^ according to the probability 
invariance p{/3)d/3 = p{r)dT , 

piP)dp = p(T)dT = p(T)dj3/j3^ = Pity dp . 
We thus conclude that p(P) = t^p(t) = p(P,X) , which proves that Eq. (8) is the 
stationary solution of the Fokker-Planck equation. 



Appendix B: Alternative fitting based estimator 

Here, we introduce an alternative estimator for E{P) based on linear 
extrapolation. First, we generalize Eq. (15) to the following. For any function 

fiP) = {E)^+kip-P), (27) 

where P^ = (P^ + Pi+J/ 2 and k is an arbitrary constant, we have 

E^=J-['-^dp(E)^ 

where on the second line, the linear term in Eq. (27) vanishes after the integration. Eq. 
(15) can be considered as a special case for A: = . 
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Eqs. (27) and (28) allow a properly extrapolation for the average energy {E) ^ 

from p to P^ before applying the integral identity, or the temperature averaging. If we 
assume that {e) ^ is roughly linear function of in the window {P_,P^), the slope A: of 

E^^ vs. P can be obtained from linear regression as 

k = {ApAE)J(Ap')^, (29) 

where the angular bracket (. . denotes a temperature average over (J3_,P^). 

The advantage of using Eq. (28) is that it reduces the magnitude of the second 
integral and thus makes the estimator more robust. If the relation of and /3 is 

perfectly linear, f'{/3) = d(E'^ ^jdfi - k vanishes everywhere, and thus the second 

integral yields zero. 

In practice, we use a ^{P) parameterized with a_^=a_ = l/2, and restrain the 

magnitude of the second integral within the average energy fluctuation ^(^AE^^ in the 

window to ensure stability of the estimator. 

The estimator introduced here can be thought as a first-order generalization of Eq. 

(15), and thus allows a large temperature window or application to larger systems. 

Besides, one can also apply additional constraints to the derivative to improve the 

accuracy and stability of the estimator. For example, in our case of estimating the 

average energy, the derivative k corresponds to the negative energy fluctuation as 

S{e)^I dfi = -{aE^^ ^ = -k^T^Cy (where Cy is the heat capacity), and its value can be 

restrained within a certain range. In the systems tested in this work, however, it did not 
produce significantly different estimates from the one introduced in the Sec. II.D. 
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Appendix C: Choice of the temperature distribution w{fi) 

Here, we discuss the optimal choice of w{P) . For molecular systems, we 
invariably use w{j3) « p{/3) -Xj (i basing on the following reason. The overall energy 
distribution p{E) of the generalized ensemble is a superposition of energy distributions 
of canonical ensembles from different temperatures. To ensure a fixed degree of overlap, 
the height of p{E) only needs to match that of a canonical distribution at P , whose 

average energy {E{P)'^ is roughly equal to E. The average height of a canonical 
ensemble is inversely proportional to its width .^(aE^^ , and thus, 

piE)dEoc 



(ae' 



To translate the energy distribution to the temperature space, we change variable 
from Eto P . According to the probability invariance p(E)dE = p(P)dp , and 

\dE/dj3\ = {aE^)^ , we have 

\dE/dB\ n Jc 



(ae') P 



In the last step, we have used fact that k^P^lAE^^ ^ = C , where C is the heat capacity. 

For a Lennard- Jones like system, the heat capacity C is roughly a constant, 
accordingly the optimal w{P) is proportional to 1/ P . On the other hand, our protein 
simulations show that the heat capacity of the entire system, water and protein, roughly 
follows Ccc p , see Figure 8. According to this observation, the optimal w(P) should be 
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proportional to l/ -J^ . However, in our simulations, we still used 1/ ^ as w(j3) , this 



setup slightly biases the ensemble towards the high temperature end in order to encourage 

a faster motion at higher temperature and to help overcome broken ergodicity at lower 
temperature. 

Appendix D: Integral identity for a nonnegative quantity 

For a histogram-like quantity, regular integral identities such as Eq. (15) should 
be modified to ensure the output is nonnegative. Here, we briefly sketch a technique for 
this purpose (a more detailed and general treatment is presented elsewhere ^). 

Suppose h(j3) is a nonnegative quantity, and we are interested in estimating its 

value at a particular temperature P* . Instead of applying an integral identity to h(/^) 
itself, we introduce a smooth modulating function / {/^) , with / (y^*) = 1 , and apply the 
identity to the product h(P) f {P) , 



Kp*) = h(p*)f(p') = l\h(p)f(p)] ^'(p)dp + l\h(p)f(p)]'^(p)dp . (30) 



the second term on the right hand side of Eq. (30) vanishes because [h(P) f (P)]' = 
ever5rwhere. Thus, the estimated value is always nonnegative given that (p'(P)> . 

rP+ 

^'(P) satisfies the normalization condition ^'(P)dp = 1 , and should be proportional to 
w{P)l f (P) to minimize the error. Thus, the general expression for h{P*) is 



If we choose 




(31) 




l^w(P)h(P)dp 



(32) 
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In the case of energy histogram, h{P) can defined as an integral of the canonical 
energy distribution at fi over a small energy bin AE , 



^ cE.AE g(E')cxpi-j3E') 
Z(^) 

/ fE+AE \ I 



where g{E') is the density of states; on the second line, we have used the distribution 

exp(-j3E') 

function of the generalized ensemble p^(E') = w{P)— — - — - — — — -. 

It can be straightforwardly verified that 

f/"" E'giE')expi-pE')dE' 



[\nhm' = -[lnZiP)] 



f/"" g(E')cxp(-j3E')dE' 
«-[lnZm'-iE + AE/2), 

where, on the last line, we assumed that the energy bin size AE is sufficiently small such 
that a typical energy distribution does not vary drastically within a bin. Practically, it 
only requires the bin size AE to be much smaller than the fluctuation (or width) of a 
typical energy distribution. We finally reached 

fE+AE 



piP,E')dE'dp 



f^Mm ZiPyZ(P) cxp[-iP - p')iE + AE/2)]dp 
By substituting Z(P) for Z(P) , we recover Eq. (20). 
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Figure captions 

Figure 1 (a) Schematic illustration of auxiliary functions (p^fi) , ^Xfi) ^AP) ■> which 
are used in the integral identity for estimating the thermal average energy Ei^jS) . The 
estimate computed in this way uses statistics from a large temperature window (y5_^, y^^) 
instead of a single bin and also avoids systematic bias. ^{{3) is a combination 

of a smooth function ^^{fi) and a function ^^{P) localized at {p., p._^^) . (pXP) is 
controlled by two parameters and a_ that satisfy a_^ + a_=l. (b) Schematic 
illusfration of ^l(P) and -^[{P) . <I>'XP) (shaded) spans over the whole temperature 
window while -^[{fi) is localized in {fi^, p^^^) . 

Figure 2 Thermodynamic quantities as functions of P for the 32 x 32 Ising model. 
Results from the method infroduced in section II are labeled as Method 1 (thin solid lines, 
cross for errors), and those from the method introduced in Appendix B are labeled as 
Method 2 (dashed lines, circles for errors); (a) the partition function, (b) the heat 

capacity, (c) the average energy, (d) E{P) . 

Figure 3 Reconstructed energy disfribution at a few temperatures using Eq. (20) with a 
window size A/9 = 0.02 for the 32 x 32 Ising model. For comparison, we also show the 
resulting disfributions from averaging energy distributions of two adjacent temperature 
bins, with the bin size 5fi = 0.0002 . The energy disfributions constructed from Eq. (20) 
with a larger window are more precise than those from simple averages of two adjacent 
bins. 
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Figure 4 (a) E{P) of an 864-particle Lennard- Jones system. As a comparison, the 

values of the average energy from several constant temperature simulations are shown as 
dots, (b) The reconstructed radial distribution functions of an 864-particle Lennard- Jones 
system at two selected temperatures J'= 1.0 and 2.0 are shown as points. As a 
comparison, the corresponding radial distribution functions from independent constant 
temperature simulations are shown as lines. 

Figure 5 Trpzip2: (a) the initial fully-extended conformation, (b) a typical folded 
structure from trajectory 2. The Ca-RMSD and heavy atom RMSD are 0.25 A and l.OSA, 
respectively. Gray: reference structure (PDB ID: ILEl). 

Figure 6 Trpzip2: quantities along two independent simulation frajectories. Left: the 
first 200 ns of a fast-folding frajectory (trajectory 1). Right: 2 |j,s of frajectory 3. Panels 
from top to bottom: (a) Ca-RMSD from native structure, (b) Ca radius of gyration, (c) 
temperature, (d) potential energy. 

Figure 7 Trpzip2: (a) disfribution along the RMSD from the native structure at three 
temperatures 300K, 400K and 500K, calculated from trajectory 3, (b) fraction P of the 
folded state versus the temperature. Inset: linear fitting of log(PQ/P-l) versus P 
according to the two-state model. 

Figure 8 Trpzip2: the heat capacity, computed from four independent trajectories. Bold 
solid line: empirical formula Cy « 4.2xloYr . 
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Figure 9 Trpzip2: joint distributions of (a) the radius of gyration versus Ca-RMSD, (b) 
the potential energy versus temperature. A brighter color represents a higher population 
density. 

Figure 10 Trp-cage: (a) the initial fully-extended structure, (b) a typical folded structure. 
The Ca-RMSD and the all heavy atom RMSD are 0.44 A and 1 .54 A respectively. Gray: 
reference structure (PDB ID: 1L2Y). 

Figure 11 Trp-cage: three independent trajectories (a) Ca-RMSD from native structure, 
(b) Ca radius of gyration, (c) temperature, (d) potential energy. 

Figure 12 Villin headpiece: (a) the initial fully-extended structure, (b) a typical folded 
structure compared v^^ith an NMR reference structure (gray, PDB ID: 1 VII, Ca-RMSD: 
1.15 A), (c) a typical folded structure compared with an x-ray structure (gray, PDB ID: 
1 YRF, Ca-RMSD: 0.47 A) and the N-terminal is not shown due to the sequence 
difference. 

Figure 13 Villin headpiece: joint distributions of (a) Ca-RMSD and Ca-RMSDcore (from 
residues 9~32) using the NMR structure as the reference, (b) Ca-RMSD and Ca- 
RMSDcore, using the x-ray structure as the reference. Statistics from the two trajectories 
were combined. 
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Table I. Lowest RMSDs and folding time from four independent folding frajectories of 
trpzip2. Snapshots of reaching the lowest Ca-RMSDs can differ slightly from those of 
reaching the lowest heavy atom RMSDs. 



Traj. 


Lowest Ca- 


Lowest heavy atom 


First time of stably reaching atomic 


ID 


RMSD (A) 


RMSD (A) 


accuracy of the native state (ns) 


1 


0.20 


1.00 


20 


2 


0.25 


0.84 


520 


3 


0.20 


0.88 


530 


4 


0.25 


1.31 


1080 



Table II. Lowest RMSDs in A reached in two folding trajectories of villin headpiece. 
Snapshots of reaching the lowest Ca-RMSDs can differ slightly from those of reaching 
the lowest heavy atom RMSDs. The reference structures (either the NMR or x-ray 
structure) are denoted in parentheses. 



Traj. 


RMSDcore 


RMSD 


RMSDcore 


RMSD 


First time of reaching 


ID 


(NMR) 


(NMR) 


(x-ray) 


(x-ray) 


atomic accuracy of the 












native state (ns) 


1 


0.72 


1.32 


0.30 


0.42 


310 


2 


0.73 


0.99 


0.30 


0.40 


26 
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